Abstract

Chang's Lemma is a widely employed result in additive combinatorics. It gives bounds 
on the dimension of the large spectrum of probability distributions on finite abelian groups. 
Recently, Bloom (2016) presented a powerful variant of Chang's Lemma that yields the strongest 
known quantitative version of Roth's theorem on 3-term arithmetic progressions in dense 
subsets of the integers. In this note, we show how such theorems can be derived from the 
approximation of probability measures via entropy maximization. 


1 Introduction 


Let $G$ be a finite abelian group. Chang's Lemma [Cha02] asserts that, for every large subset $S \subseteq G$, the large Fourier coefficients of the indicator function $1_S$ lie in a low-dimensional subspace. This has seen a number of applications in additive combinatorics (in addition to Chang's original application to Freiman's theorem).

A theorem of Bloom [Blo16] shows that a large subset of the large spectrum can be contained in an even lower-dimensional subspace. We refer to Section 3 for the formal statements. Bloom employs his theorem as the key tool in obtaining the following quantitative version of Roth's theorem.


Theorem 1.1. There exists a $c > 0$ such that for all sufficiently large $N$, the following holds: If $A \subseteq \{1, \ldots, N\}$ contains no non-trivial three-term arithmetic progression, then

$$|A| \le c\, \frac{N (\log\log N)^4}{\log N}.$$

This improves slightly over the breakthrough result of Sanders [San11], in which $(\log\log N)^4$ is replaced by a larger power of $\log\log N$.

In this note, we state a general approximation theorem for probability measures on finite spaces that carry no algebraic structure. From this theorem, Bloom's result follows easily. While Bloom's proof uses the additive structure in a seemingly fundamental and intricate way, our argument is elementary: it uses only the facts that the characters of a finite abelian group are homomorphisms and are bounded in $\ell_\infty$.

The statement and proof are inspired by the "entropy maximization" philosophy: Given a probability measure $\mu$ and a collection of linear observables $\mathcal{F}$, one can find a "simple" approximator $\tilde\mu$ (with respect to $\mathcal{F}$) by maximizing the entropy of $\tilde\mu$ over all probability measures having similar behavior on $\mathcal{F}$.

Our use of this philosophy is motivated by the work [LRS15], where it is employed in the setting of quantum states and von Neumann entropy. In [IMR14], the authors use a simple entropy argument to prove the special case of Chang's Lemma when $G = \mathbb{F}_2^n$. The entropy-maximization approach is also related, at least in spirit, to the works [Gow10] and [RTTV08] on "dense model theorems," and to a long line of works employing an "entropy regularizer" in the setting of convex optimization. For a discussion of these connections, additional applications of our sparse approximation theorem, and further accounts of the use of relative entropy in additive combinatorics, we refer to the forthcoming paper of Wolf [Wol17].

In the next section, we state and prove an approximation theorem in the context of finite 
probability spaces. In Section 3, we prove the results of Bloom and Chang. 


2 An approximation theorem 


Let $X$ be a finite set equipped with a probability measure $\mu$. We use $L^2(\mu)$ to denote the Hilbert space of real-valued functions on $X$ equipped with the inner product $\langle f, g\rangle = \sum_{x \in X} \mu(x) f(x) g(x)$. For a function $h : X \to \mathbb{R}$, we will use the notation $\mathbb{E}_\mu h = \sum_{x \in X} \mu(x) h(x)$. We also denote by $\|h\|_p = (\mathbb{E}_\mu |h|^p)^{1/p}$ the $L^p(\mu)$ norm for $p \ge 1$.

Denote the set of densities with respect to $\mu$ by $\Delta_X = \{f : X \to [0, \infty) : \|f\|_1 = 1\}$. For $f \in \Delta_X$, define the relative entropy

$$\mathrm{Ent}_\mu(f) = \mathbb{E}_\mu[f \log f].$$

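We will also use the notion of the relative entropy between two densities $h, h' \in \Delta_X$:

$$D_\mu(h \,\|\, h') = \mathbb{E}_\mu\!\left[h \log \frac{h}{h'}\right].$$

This definition makes sense whenever $\mathrm{supp}(h) \subseteq \mathrm{supp}(h')$. Otherwise, we take the value to be $+\infty$. Note that $\mathrm{Ent}_\mu(f) = D_\mu(f \,\|\, 1)$, where $1$ denotes the constant density.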

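Generalized Riesz products. Suppose that $\mathcal{F} \subseteq L^2(\mu)$ is a collection satisfying $\sup_{\varphi \in \mathcal{F}} \|\varphi\|_\infty \le 1$. Define the semi-norm $\|f\|_{\mathcal{F}} = \sup_{\varphi \in \mathcal{F}} |\langle \varphi, f\rangle|$. Say that a function $R \in L^2(\mu)$ is a degree-$d$ Riesz $\mathcal{F}$-product if

$$R(x) = \prod_{i=1}^{d} \bigl(1 + \varepsilon_i \varphi_i(x)\bigr)$$

for some $d \ge 1$ and $\varphi_1, \ldots, \varphi_d \in \mathcal{F}$, $\varepsilon_1, \ldots, \varepsilon_d \in \{-1, 0, 1\}$. Observe that every such $R$ is non-negative on $X$, since each factor takes values in $[0, 2]$.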
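As a quick numerical illustration of the definition, the following Python sketch evaluates a random Riesz product on a toy space and confirms that it is non-negative and bounded by $2^d$. The choice $X = \mathbb{Z}_{12}$ with the functions $\varphi_k(x) = \cos(2\pi k x/12)$, and the helper names `riesz_product` and `make_cos`, are illustrative assumptions, not part of the text.

```python
import math
import random


def riesz_product(phis, eps, x):
    """Evaluate R(x) = prod_i (1 + eps_i * phi_i(x)); assumes |phi_i| <= 1 and eps_i in {-1, 0, 1}."""
    value = 1.0
    for phi, e in zip(phis, eps):
        value *= 1.0 + e * phi(x)
    return value


N = 12  # hypothetical toy space X = Z_12 with the uniform measure


def make_cos(k):
    # phi_k(x) = cos(2*pi*k*x/N) satisfies sup |phi_k| <= 1
    return lambda x: math.cos(2 * math.pi * k * x / N)


random.seed(0)
d = 5
phis = [make_cos(random.randrange(N)) for _ in range(d)]
eps = [random.choice([-1, 0, 1]) for _ in range(d)]
values = [riesz_product(phis, eps, x) for x in range(N)]

# Each factor lies in [0, 2], so R is non-negative and bounded by 2**d.
print(min(values) >= 0.0, max(values) <= 2.0 ** d)
```

Using a factory function such as `make_cos` avoids Python's late binding of loop variables inside lambdas.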


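Theorem 2.1 (Sparse approximation theorem). For every $0 < \eta < 1/2$ and $f \in \Delta_X$, there is a $g \in \Delta_X$ such that:

1. $\|f - g\|_{\mathcal{F}} \le \eta$.

2. There is a subset $\mathcal{F}' \subseteq \mathcal{F}$ with
$$|\mathcal{F}'| \le 9\, \frac{\mathrm{Ent}_\mu(f)}{\eta^2} \tag{2.1}$$
and such that $g$ is a non-negative linear combination of degree-$d$ Riesz $\mathcal{F}'$-products for
$$d \le 12\, \frac{\mathrm{Ent}_\mu(f)}{\eta} + O\!\left(\frac{\log(1/\eta)}{\log\log(1/\eta)}\right). \tag{2.2}$$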
While Theorem 2.1 yields a result that is closely related to Chang's Lemma and is sufficient for the case $G = \mathbb{F}_2^n$, it seems that a more delicate property is required to recover the full statement. Say that the family $\mathcal{F}$ is Laplace pseudorandom if for every collection $\{\lambda_\varphi : \varphi \in \mathcal{F}\}$ of real numbers, the following property holds:

$$\log \mathbb{E}_\mu \exp\Bigl(\sum_{\varphi \in \mathcal{F}} \lambda_\varphi \varphi\Bigr) \le \frac{1}{2} \sum_{\varphi \in \mathcal{F}} \lambda_\varphi^2. \tag{2.3}$$


Lemma 2.2. If $\mathcal{F}$ is Laplace pseudorandom, then for any $f \in \Delta_X$, it holds that

$$\sum_{\varphi \in \mathcal{F}} \langle f, \varphi\rangle^2 \le 2\, \mathrm{Ent}_\mu(f).$$


2.1 Duality theory for relative entropy minimization 

Lemma 2.2 and part of Theorem 2.1 can be proved using only elementary properties of duality 
for optimization of convex functions over polytopes. Establishing the bound (2.1) will require an 
iterative algorithm described in Section 2.3. 

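Fix some $f \in \Delta_X$, a finite collection $\mathcal{F}_0 \subseteq L^2(\mu)$, and a parameter $\delta \ge 0$. Consider the optimization:

$$\begin{aligned}
\text{minimize} \quad & \mathrm{Ent}_\mu(g) \\
\text{subject to} \quad & g \in \Delta_X \\
& \langle g, \varphi\rangle \ge \langle f, \varphi\rangle - \delta \quad \forall \varphi \in \mathcal{F}_0.
\end{aligned} \tag{2.4}$$

Note that we are minimizing a strongly convex function over a non-empty, compact polytope (non-empty since $f$ itself satisfies all the constraints), and thus (2.4) has a unique optimal solution. The corresponding dual optimization is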


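$$\begin{aligned}
\text{maximize} \quad & -\log \mathbb{E}_\mu \exp\Bigl(\sum_{\varphi \in \mathcal{F}_0} \lambda_\varphi \varphi\Bigr) + \sum_{\varphi \in \mathcal{F}_0} \lambda_\varphi \bigl(\langle f, \varphi\rangle - \delta\bigr) \\
\text{subject to} \quad & \lambda_\varphi \ge 0 \quad \forall \varphi \in \mathcal{F}_0.
\end{aligned} \tag{2.5}$$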


See, for instance, [BV04, §5.2.4]. 

Let $P^*$ and $D^*$ denote the optimal values of (2.4) and (2.5), respectively. By weak duality, the inequality $P^* \ge D^*$ always holds. Let us use this fact to prove Lemma 2.2.

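Proof of Lemma 2.2. Consider the optimizations (2.4) and (2.5) with $\delta = 0$ and

$$\mathcal{F}_0 = \{\mathrm{sign}(\langle f, \varphi\rangle)\, \varphi : \varphi \in \mathcal{F}\},$$

so that $\langle f, \varphi\rangle \ge 0$ for $\varphi \in \mathcal{F}_0$. Since $f$ is feasible for (2.4), weak duality gives

$$\mathrm{Ent}_\mu(f) \ge P^* \ge D^* \ge -\log \mathbb{E}_\mu \exp\Bigl(\sum_{\varphi \in \mathcal{F}_0} \langle f, \varphi\rangle \varphi\Bigr) + \sum_{\varphi \in \mathcal{F}_0} \langle f, \varphi\rangle^2,$$

where the last inequality employs the feasible solution $\{\lambda_\varphi = \langle f, \varphi\rangle : \varphi \in \mathcal{F}_0\}$. Since $\sum_{\varphi \in \mathcal{F}_0} \langle f, \varphi\rangle \varphi = \sum_{\varphi \in \mathcal{F}} \langle f, \varphi\rangle \varphi$, the assumption that $\mathcal{F}$ is Laplace pseudorandom yields

$$\mathrm{Ent}_\mu(f) \ge \sum_{\varphi \in \mathcal{F}} \langle f, \varphi\rangle^2 - \frac{1}{2} \sum_{\varphi \in \mathcal{F}} \langle f, \varphi\rangle^2 = \frac{1}{2} \sum_{\varphi \in \mathcal{F}} \langle f, \varphi\rangle^2,$$

completing the proof.

□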
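Lemma 2.2 can be sanity-checked numerically in perhaps the simplest Laplace pseudorandom example. The sketch below assumes $X = \{-1, 1\}^n$ with the uniform measure and takes $\mathcal{F}$ to be the coordinate functions $x \mapsto x_i$; these satisfy (2.3) because $\mathbb{E}_\mu \exp(\sum_i \lambda_i x_i) = \prod_i \cosh(\lambda_i) \le \exp(\frac12 \sum_i \lambda_i^2)$. The random densities, the seed, and the helper names are illustrative choices, not part of the text.

```python
import itertools
import math
import random


def entropy(f):
    """Ent_mu(f) = E_mu[f log f] for the uniform measure, with the convention 0 log 0 = 0."""
    return sum(v * math.log(v) for v in f if v > 0) / len(f)


def random_density(size, rng):
    """Random non-negative weights, rescaled so that E_mu f = 1."""
    w = [rng.random() ** 4 for _ in range(size)]
    total = sum(w) / size
    return [v / total for v in w]


def coordinate_correlations(f, cube):
    """<f, phi_i> = E_mu[f(x) x_i] for the coordinate functions phi_i(x) = x_i."""
    n = len(cube[0])
    return [sum(fv * x[i] for fv, x in zip(f, cube)) / len(cube) for i in range(n)]


n = 4
cube = list(itertools.product([-1, 1], repeat=n))   # X = {-1, 1}^n with the uniform measure
rng = random.Random(1)

worst_ratio = 0.0
for _ in range(200):
    f = random_density(len(cube), rng)
    lhs = sum(c * c for c in coordinate_correlations(f, cube))
    worst_ratio = max(worst_ratio, lhs / (2 * entropy(f)))

# Lemma 2.2 predicts sum_i <f, phi_i>^2 <= 2 Ent_mu(f), i.e. every ratio is at most 1.
print(worst_ratio)
```

A point mass is a useful extreme case: with $f = 2^n \cdot 1_{\{x_0\}}$ the left-hand side equals $n$ and the right-hand side equals $2n \log 2$.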


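For $\delta > 0$, the optimization (2.4) is strictly feasible (as witnessed by $f$), and hence Slater's theorem implies that strong duality holds and $P^* = D^*$ (see, e.g., [BV04, §5.3.2]). In this case, the KKT conditions hold, i.e., the gradient of the Lagrangian is identically zero at the optimal solution. Let $(g^*, \{\lambda^*_\varphi\})$ denote the corresponding optimal primal-dual pair. The gradient condition yields

$$g^* = \frac{\exp\bigl(\sum_{\varphi \in \mathcal{F}_0} \lambda^*_\varphi \varphi\bigr)}{\mathbb{E}_\mu \exp\bigl(\sum_{\varphi \in \mathcal{F}_0} \lambda^*_\varphi \varphi\bigr)}, \tag{2.6}$$

and therefore

$$\mathrm{Ent}_\mu(f) - D_\mu(f \,\|\, g^*) = \mathbb{E}_\mu[f \log g^*] = D^* + \delta \sum_{\varphi \in \mathcal{F}_0} \lambda^*_\varphi, \tag{2.7}$$

where the latter equality uses $\mathbb{E}_\mu f = 1$.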
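The computation behind (2.7) holds for every $\lambda \ge 0$, not only at the optimum. Writing $g_\lambda$ for the right-hand side of (2.6) with $\lambda$ in place of $\lambda^*$, and $\Phi(\lambda)$ for the dual objective in (2.5), one has $\mathbb{E}_\mu[f \log g_\lambda] = \Phi(\lambda) + \delta \sum_\varphi \lambda_\varphi$, hence $\mathrm{Ent}_\mu(f) - \Phi(\lambda) = D_\mu(f \,\|\, g_\lambda) + \delta \sum_\varphi \lambda_\varphi \ge 0$, which is weak duality with $f$ as the primal point. The Python sketch below checks this numerically; the space size, the random family $\mathcal{F}_0$, the point $\lambda$, and the helper names are assumptions made only for illustration.

```python
import math
import random

rng = random.Random(2)
size, m, delta = 10, 4, 0.1            # |X|, |F_0| and the slack delta (toy values)


def expect(h):
    """E_mu h for the uniform measure mu on X = {0, ..., size - 1}."""
    return sum(h) / size


def inner(a, b):
    return expect([x * y for x, y in zip(a, b)])


w = [rng.random() + 0.05 for _ in range(size)]
f = [v / expect(w) for v in w]                          # a strictly positive density
F0 = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(m)]
lam = [rng.uniform(0, 2) for _ in range(m)]              # any dual-feasible point (lambda >= 0)

# g_lambda: the right-hand side of (2.6) with lambda in place of lambda*
expo = [sum(l * phi[x] for l, phi in zip(lam, F0)) for x in range(size)]
Z = expect([math.exp(e) for e in expo])
g = [math.exp(e) / Z for e in expo]

dual = -math.log(Z) + sum(l * (inner(f, phi) - delta) for l, phi in zip(lam, F0))
ent_f = expect([v * math.log(v) for v in f])
kl = expect([v * math.log(v / u) for v, u in zip(f, g)])     # D_mu(f || g_lambda)
cross = expect([v * math.log(u) for v, u in zip(f, g)])      # E_mu[f log g_lambda]

print(cross - (dual + delta * sum(lam)), ent_f - kl - cross, ent_f >= dual)
```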

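Lemma 2.3. For every $\delta > 0$, the optimal solution $\{\lambda^*_\varphi\}$ of (2.5) satisfies

$$\delta \sum_{\varphi \in \mathcal{F}_0} \lambda^*_\varphi \le \mathrm{Ent}_\mu(f) - D_\mu(f \,\|\, g^*).$$

It follows that

$$\sum_{\varphi \in \mathcal{F}_0} \lambda^*_\varphi \le \frac{\mathrm{Ent}_\mu(f)}{\delta}.$$

Proof. Note that $D^* \ge 0$ because $\lambda_\varphi = 0$ for all $\varphi$ is a feasible solution with objective value $0$. Therefore (2.7) yields

$$\delta \sum_{\varphi \in \mathcal{F}_0} \lambda^*_\varphi = \mathrm{Ent}_\mu(f) - D_\mu(f \,\|\, g^*) - D^* \le \mathrm{Ent}_\mu(f) - D_\mu(f \,\|\, g^*) \le \mathrm{Ent}_\mu(f).$$

□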


2.2 Truncating the exponential 

Let us now move on to the proof of Theorem 2.1. 

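Lemma 2.4. Suppose that $\|\varphi\|_\infty \le 1$ for $\varphi \in \mathcal{F}_0 \subseteq L^2(\mu)$. Consider non-negative numbers $\{c_\varphi : \varphi \in \mathcal{F}_0\}$ and

$$h = \frac{\exp\bigl(\sum_{\varphi \in \mathcal{F}_0} c_\varphi (1 + \varphi)\bigr)}{\mathbb{E}_\mu \exp\bigl(\sum_{\varphi \in \mathcal{F}_0} c_\varphi (1 + \varphi)\bigr)}.$$

Then for every $0 < \eta < 1/2$, there is a density $\tilde h \in \Delta_X$ that is a non-negative linear combination of degree-$d$ Riesz $\mathcal{F}_0$-products and such that

$$d \le 6 \sum_{\varphi \in \mathcal{F}_0} c_\varphi + O\!\left(\frac{\log(1/\eta)}{\log\log(1/\eta)}\right)$$

and

$$\mathbb{E}_\mu |h - \tilde h| \le \eta.$$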














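Proof. Let $\psi = \sum_{\varphi \in \mathcal{F}_0} c_\varphi (1 + \varphi)$, and note that each summand is non-negative (because $\|\varphi\|_\infty \le 1$) and $\|\psi\|_\infty \le 2c$, where $c = \sum_{\varphi \in \mathcal{F}_0} c_\varphi$. Denote $p_m(x) = \sum_{j=0}^{m} x^j / j!$ and recall from Taylor's theorem that for $B > 0$,

$$\max_{x \in [0, B]} \frac{|e^x - p_m(x)|}{e^x} \le \frac{B^{m+1}}{(m+1)!}.$$

Let us choose $m = 3B + O\bigl(\log(1/\eta)/\log\log(1/\eta)\bigr)$ so as to make this quantity less than $\eta/2$. Thus setting $B = 2c$ yields, pointwise on $X$,

$$|e^{\psi} - p_m(\psi)| \le \frac{\eta}{2}\, e^{\psi}. \tag{2.8}$$

Now define

$$\tilde h = \frac{p_m(\psi)}{\mathbb{E}_\mu p_m(\psi)},$$

and note that $\tilde h$ is a non-negative combination of degree-$m$ Riesz $\mathcal{F}_0$-products. (Expanding $\psi^j$ gives a non-negative combination of products of $j$ factors of the form $1 + \varphi$; for $j < m$, one pads with $m - j$ factors having $\varepsilon_i = 0$.) Moreover,

$$\mathbb{E}_\mu |h - \tilde h| \le \mathbb{E}_\mu \left|\frac{e^{\psi} - p_m(\psi)}{\mathbb{E}_\mu e^{\psi}}\right| + \mathbb{E}_\mu \left|\frac{p_m(\psi)}{\mathbb{E}_\mu e^{\psi}} - \frac{p_m(\psi)}{\mathbb{E}_\mu p_m(\psi)}\right| \overset{(2.8)}{\le} \frac{\eta}{2} + \frac{|\mathbb{E}_\mu p_m(\psi) - \mathbb{E}_\mu e^{\psi}|}{\mathbb{E}_\mu e^{\psi}} \overset{(2.8)}{\le} \eta.$$

□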
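The construction in the proof of Lemma 2.4 is easy to reproduce numerically. The sketch below takes the smallest $m$ with $B^{m+1}/(m+1)! \le \eta/2$ for $B = 2\sum_\varphi c_\varphi$, forms $\tilde h = p_m(\psi)/\mathbb{E}_\mu p_m(\psi)$ as in the proof, and checks $\mathbb{E}_\mu|h - \tilde h| \le \eta$. The toy space $\mathbb{Z}_{16}$, the cosine family, the random weights $c_\varphi$, and the helper names are assumptions made for illustration; the code does not expand $\tilde h$ into Riesz products.

```python
import math
import random


def taylor_exp(x, m):
    """p_m(x) = sum_{j=0}^m x^j / j!, the degree-m Taylor polynomial of exp."""
    term, total = 1.0, 1.0
    for j in range(1, m + 1):
        term *= x / j
        total += term
    return total


def smallest_degree(B, eta):
    """Smallest m with B^(m+1)/(m+1)! <= eta/2 (the Taylor relative-error bound)."""
    m, bound = 0, B  # bound = B^(m+1)/(m+1)! for m = 0
    while bound > eta / 2:
        m += 1
        bound *= B / (m + 1)
    return m


def normalize(w):
    """Rescale non-negative weights so that their mean under the uniform measure is 1."""
    avg = sum(w) / len(w)
    return [v / avg for v in w]


N, eta = 16, 0.05
rng = random.Random(3)
family = [[math.cos(2 * math.pi * k * x / N) for x in range(N)] for k in range(1, 5)]
c = [rng.uniform(0, 1) for _ in family]              # non-negative coefficients c_phi

# psi = sum_phi c_phi (1 + phi) takes values in [0, 2 * sum(c)]
psi = [sum(cw * (1 + phi[x]) for cw, phi in zip(c, family)) for x in range(N)]
B = 2 * sum(c)
m = smallest_degree(B, eta)

h = normalize([math.exp(p) for p in psi])
h_tilde = normalize([taylor_exp(p, m) for p in psi])
l1 = sum(abs(a - b) for a, b in zip(h, h_tilde)) / N   # E_mu |h - h_tilde|
print(m, l1)
```

Since $\psi \ge 0$, every $p_m(\psi(x))$ is positive, so $\tilde h$ is a genuine density; the bound $\mathbb{E}_\mu|h - \tilde h| \le 2 \cdot B^{m+1}/(m+1)!$ is exactly the chain of inequalities in the proof.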


We first prove Theorem 2.1 without the sparsity constraint (2.1) since it follows easily from the 
machinery we already have. 

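Theorem 2.5 (Low-degree approximation theorem). For every $0 < \eta < 1/2$ and $f \in \Delta_X$, there is a $g \in \Delta_X$ such that:

1. $\|f - g\|_{\mathcal{F}} \le \eta$.

2. $g$ is a non-negative linear combination of degree-$d$ Riesz $\mathcal{F}$-products for
$$d \le 12\, \frac{\mathrm{Ent}_\mu(f)}{\eta} + O\!\left(\frac{\log(1/\eta)}{\log\log(1/\eta)}\right). \tag{2.9}$$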


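Proof. Consider the optimization (2.4) with $\delta = \eta/2$ and $\mathcal{F}_0 = \{\pm\varphi : \varphi \in \mathcal{F}\}$. Let $(g^*, \{\lambda^*_\varphi\})$ denote the corresponding optimal primal-dual pair and observe that

$$g^* = \frac{\exp\bigl(\sum_{\varphi \in \mathcal{F}_0} \lambda^*_\varphi (1 + \varphi)\bigr)}{\mathbb{E}_\mu \exp\bigl(\sum_{\varphi \in \mathcal{F}_0} \lambda^*_\varphi (1 + \varphi)\bigr)},$$

since multiplying the numerator and denominator of (2.6) by $\exp(\sum_{\varphi \in \mathcal{F}_0} \lambda^*_\varphi)$ does not change $g^*$. Moreover, Lemma 2.3 asserts that $c = \sum_{\varphi \in \mathcal{F}_0} \lambda^*_\varphi \le \mathrm{Ent}_\mu(f)/\delta = 2\,\mathrm{Ent}_\mu(f)/\eta$.

Thus we can apply Lemma 2.4 (with $\eta/2$ in place of $\eta$) to obtain a density $\tilde h \in \Delta_X$ that is a non-negative linear combination of degree-$d$ Riesz $\mathcal{F}_0$-products with

$$d \le 12\, \frac{\mathrm{Ent}_\mu(f)}{\eta} + O\!\left(\frac{\log(1/\eta)}{\log\log(1/\eta)}\right)$$

and such that $\|\tilde h - g^*\|_1 \le \eta/2$. (Every Riesz $\mathcal{F}_0$-product is also a Riesz $\mathcal{F}$-product, since $1 + \varepsilon(-\varphi) = 1 + (-\varepsilon)\varphi$.)

Finally, observe that for any $\varphi \in \mathcal{F}$, by the constraints of the optimization (2.4) applied to both $\varphi$ and $-\varphi$, we have

$$|\langle \tilde h - f, \varphi\rangle| \le |\langle \tilde h - g^*, \varphi\rangle| + |\langle g^* - f, \varphi\rangle| \le \|\tilde h - g^*\|_1 + \frac{\eta}{2} \le \eta,$$

where in the second inequality we have used $\|\varphi\|_\infty \le 1$. It follows that $\|\tilde h - f\|_{\mathcal{F}} \le \eta$, completing the proof. □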





































2.3 Mirror descent 


We now prove Theorem 2.1 by giving an algorithm that approximately solves the optimization (2.4). The algorithm and analysis are based on the "mirror descent" framework, analyzed using a Bregman divergence (in this case, the relative entropy). See, for instance, the monograph [Bub14]. The sparsity of the solution (captured by (2.1)) is closely related to sparsity properties of the Frank-Wolfe algorithm [FW56].

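Assume that $\eta > 0$ and $f \in \Delta_X$ are given as in the theorem. For some value $T > 0$, define a family $\{g_t : t \in [0, T]\} \subseteq \Delta_X$ by

$$g_t = \frac{\exp\bigl(\int_0^t \varphi_s\, ds\bigr)}{\mathbb{E}_\mu \exp\bigl(\int_0^t \varphi_s\, ds\bigr)}, \tag{2.10}$$

where $s \mapsto \varphi_s \in L^2(\mu)$ is a measurable function to be specified shortly. Observe that $g_0 = 1$ is the constant $1$ function.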

A simple calculation yields: For $t \in [0, T)$,

$$\frac{d}{dt} D_\mu(f \,\|\, g_t) = \langle \varphi_t, g_t - f\rangle. \tag{2.11}$$


We define the map $s \mapsto \varphi_s$ to be piecewise constant on a finite sequence of intervals. Given the definition on intervals $[0, t_1), [t_1, t_2), \ldots, [t_{i-1}, t_i)$ with $0 < t_1 < t_2 < \cdots < t_i$ (and $t_0 = 0$), we define it on an interval $[t_i, t_{i+1})$ as follows.

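If there exists a functional $\varphi \in \mathcal{F}$ such that

$$|\langle g_{t_i}, \varphi\rangle - \langle f, \varphi\rangle| > \frac{2\eta}{3},$$

then we put

$$\varphi_s = \mathrm{sign}(\langle f - g_{t_i}, \varphi\rangle) \cdot \varphi \tag{2.12}$$

for $s \in [t_i, t_{i+1})$, where $t_{i+1} = \inf\{t > t_i : |\langle g_t, \varphi\rangle - \langle f, \varphi\rangle| \le \eta/3\}$. We will see momentarily why such a $t_{i+1}$ must exist.

If there is no such functional $\varphi$ at time $t_i$, then we set $T = t_i$ and $i_{\max} = i$. By construction, we have the property that $\|f - g_T\|_{\mathcal{F}} \le 2\eta/3$.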


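Lemma 2.6. $T \le 3\, \mathrm{Ent}_\mu(f)/\eta$.

Proof. Simply observe that for $t \in [0, T)$, the calculation (2.11) combined with the definition of the sequence $\{t_i\}$ and the choice (2.12) yields

$$\frac{d}{dt} D_\mu(f \,\|\, g_t) = -|\langle g_t - f, \varphi_t\rangle| \le -\frac{\eta}{3}.$$

On the other hand, $D_\mu(f \,\|\, g_0) = \mathrm{Ent}_\mu(f)$ and $D_\mu(f \,\|\, g_T) \ge 0$ is always true. This yields the claim. (The same argument shows that no single interval can have length more than $3\,\mathrm{Ent}_\mu(f)/\eta$, which is why each $t_{i+1}$ exists.) □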


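Lemma 2.7. It holds that $i_{\max} \le 9\, \mathrm{Ent}_\mu(f)/\eta^2$.

Proof. Fix an interval $[t_{i-1}, t_i)$ with $i \le i_{\max}$. Let $\varphi = \varphi_{t_{i-1}}$. Since $\frac{d}{dt} g_t = g_t\,(\varphi - \langle \varphi, g_t\rangle)$ on this interval, we calculate

$$\frac{d}{dt} \langle \varphi, f - g_t\rangle = -\bigl\langle \varphi, g_t(\varphi - \langle \varphi, g_t\rangle)\bigr\rangle = -\langle \varphi^2, g_t\rangle + \langle \varphi, g_t\rangle^2.$$

Notice that the latter quantity is at least $-\|\varphi\|_\infty^2 \|g_t\|_1 \ge -1$. Since $\langle \varphi, f - g_t\rangle$ decreases from more than $2\eta/3$ to $\eta/3$ over the interval, it follows that $t_i - t_{i-1} \ge \eta/3$. We conclude that $i_{\max} \le 3T/\eta$ and combine this with Lemma 2.6. □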
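Observe now that

$$g_T = \frac{\exp\bigl(\int_0^T (1 + \varphi_s)\, ds\bigr)}{\mathbb{E}_\mu \exp\bigl(\int_0^T (1 + \varphi_s)\, ds\bigr)} \tag{2.13}$$

and $\|f - g_T\|_{\mathcal{F}} \le 2\eta/3$. Here $\int_0^T (1 + \varphi_s)\, ds = \sum_\varphi c_\varphi (1 + \varphi)$, where $c_\varphi$ is the total time during which $\varphi_s = \varphi$, so that $\sum_\varphi c_\varphi = T$.

Note that if we set

$$\mathcal{F}' = \{\varphi \in \mathcal{F} : \varphi = \pm\varphi_t \text{ for some } t \in [0, T]\},$$

then Lemma 2.7 yields $|\mathcal{F}'| \le i_{\max} \le 9\,\mathrm{Ent}_\mu(f)/\eta^2$. The proof of Theorem 2.1 is concluded using Lemma 2.4 in conjunction with Lemma 2.6, just as in the proof of Theorem 2.5.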
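The continuous-time process defined by (2.10) to (2.12) can be approximated with explicit Euler steps of size `dt`. The Python sketch below is a discretized illustration under assumptions of our own: $X = \mathbb{Z}_{16}$ with the uniform measure, $\mathcal{F}$ consisting of the functions $\cos(2\pi k x/16)$ and $\sin(2\pi k x/16)$, an arbitrary bump-shaped density $f$, and, at each phase, the most violated functional is selected (the text allows any violated one). In this discretization each step decreases $D_\mu(f \,\|\, g_t)$ by at least $dt\,(\eta/3 - dt/2)$ by Hoeffding's lemma, so the discrete analogue of Lemma 2.6 reads $T \le \mathrm{Ent}_\mu(f)/(\eta/3 - dt/2)$; the analogue of $t_i - t_{i-1} \ge \eta/3$ from Lemma 2.7 also holds, since one step moves $\langle g_t, \varphi\rangle$ by at most $dt$.

```python
import math


def mean(values):
    return sum(values) / len(values)


def inner(a, b):
    """<a, b> = E_mu[a * b] for the uniform measure mu on X."""
    return mean([x * y for x, y in zip(a, b)])


def tilt(theta):
    """g = exp(theta) / E_mu exp(theta), i.e. (2.10) with theta = integral of phi_s ds."""
    top = max(theta)  # subtract the max for numerical stability; it cancels after normalizing
    w = [math.exp(t - top) for t in theta]
    z = mean(w)
    return [v / z for v in w]


def mirror_descent(f, family, eta, dt=1e-3):
    """Explicit-Euler version of the piecewise-constant flow (2.10)-(2.12).

    Returns g_T, the total time T, the number of phases i_max, and the
    indices of the functionals used (a stand-in for the set F')."""
    theta = [0.0] * len(f)
    g = tilt(theta)                                  # g_0 is the constant 1 function
    f_corr = [inner(f, phi) for phi in family]
    steps, phases, used = 0, 0, set()
    while True:
        gaps = [fc - inner(g, phi) for fc, phi in zip(f_corr, family)]
        k = max(range(len(family)), key=lambda j: abs(gaps[j]))
        if abs(gaps[k]) <= 2 * eta / 3:              # no functional violated by more than 2*eta/3
            break
        phases += 1
        used.add(k)
        sign = 1.0 if gaps[k] > 0 else -1.0          # phi_s = sign(<f - g, phi>) * phi, as in (2.12)
        phi = family[k]
        while sign * (f_corr[k] - inner(g, phi)) > eta / 3:
            theta = [t + dt * sign * p for t, p in zip(theta, phi)]
            g = tilt(theta)
            steps += 1
    return g, steps * dt, phases, used


N, eta, dt = 16, 0.3, 1e-3
xs = range(N)
family = [[math.cos(2 * math.pi * k * x / N) for x in xs] for k in range(1, N // 2 + 1)]
family += [[math.sin(2 * math.pi * k * x / N) for x in xs] for k in range(1, N // 2)]

# A bump-shaped density f on Z_16 (an arbitrary test input)
w = [math.exp(-((x - 5) ** 2) / 4.0) for x in xs]
f = [v / mean(w) for v in w]

g, T, phases, used = mirror_descent(f, family, eta, dt)
residual = max(abs(inner(f, phi) - inner(g, phi)) for phi in family)
print(T, phases, len(used), residual)
```

The function $\sin(\pi x)$ (the case $k = 8$) is omitted because it vanishes on $\mathbb{Z}_{16}$.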


3 Covering the large spectrum 

Let $G$ be a finite abelian group equipped with the uniform measure $\mu$, and let $\hat G$ be the dual group. Let $0$ denote the identity element in $G$ and $\hat G$.

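For $\gamma \in \hat G$, let $u_\gamma : G \to \mathbb{C}$ denote the corresponding character. One can write any $f : G \to \mathbb{C}$ as $f = \sum_{\gamma \in \hat G} \hat f(\gamma) u_\gamma$, where $\hat f(\gamma) = \mathbb{E}_\mu[f\, \overline{u_\gamma}]$. We will need the properties that $u_\gamma u_{\gamma'} = u_{\gamma + \gamma'}$ for all $\gamma, \gamma' \in \hat G$ and $\max_{x \in G} |u_\gamma(x)| \le 1$. One may consult [TV10, Ch. 4] for a treatment of discrete Fourier analysis tailored to applications in additive combinatorics.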

For each value $\delta \ge 0$, we define the set

$$\mathrm{Spec}_\delta(f) = \{\gamma \in \hat G : |\hat f(\gamma)| > \delta\}.$$
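For the cyclic group $G = \mathbb{Z}_N$, whose characters can be taken to be $u_\gamma(x) = e^{2\pi i \gamma x/N}$ (an identification assumed here), the large spectrum can be computed directly. The Python sketch below uses the hypothetical helpers `fourier` and `spec`, applies them to the density $1_S/\mu(S)$ of an arbitrary set $S$, and checks Parseval's identity $\sum_\gamma |\hat f(\gamma)|^2 = \mathbb{E}_\mu |f|^2$ along the way.

```python
import cmath


def fourier(f):
    """hat f(gamma) = E_mu[f * conj(u_gamma)] with u_gamma(x) = exp(2*pi*i*gamma*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * g * x / n) for x in range(n)) / n
            for g in range(n)]


def spec(f, delta):
    """Spec_delta(f) = {gamma : |hat f(gamma)| > delta}."""
    return [g for g, c in enumerate(fourier(f)) if abs(c) > delta]


N = 20
S = {0, 1, 2, 3, 4, 10, 11}                            # an arbitrary subset of Z_20
f = [N / len(S) if x in S else 0.0 for x in range(N)]  # the density 1_S / mu(S)

coeffs = fourier(f)
parseval_lhs = sum(abs(c) ** 2 for c in coeffs)
parseval_rhs = sum(v * v for v in f) / N
print(spec(f, 0.5), round(parseval_lhs, 9), round(parseval_rhs, 9))
```

For any density, $\hat f(0) = \mathbb{E}_\mu f = 1$, so $0 \in \mathrm{Spec}_\delta(f)$ whenever $\delta < 1$.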

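Say that a subset $S \subseteq \hat G$ is covered by a subset $A \subseteq \hat G$ if

$$S \subseteq \Bigl\{\sum_{a \in A} \varepsilon_a a : \varepsilon_a \in \{-1, 0, 1\}\Bigr\}.$$

A subset $S \subseteq \hat G$ is $d$-covered if there exists a subset $A \subseteq \hat G$ with $|A| \le d$ that covers $S$.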

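Let us define the family

$$\mathcal{F} = \{\mathrm{Re}\, u_\gamma,\ \mathrm{Im}\, u_\gamma : \gamma \in \hat G\} \subseteq L^2(\mu).$$

Note that $\|\varphi\|_\infty \le 1$ for every $\varphi \in \mathcal{F}$.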

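Lemma 3.1. If $R$ is a degree-$d$ Riesz $\mathcal{F}$-product, then $\mathrm{Spec}_0(R) = \{\gamma \in \hat G : \hat R(\gamma) \ne 0\}$ is $d$-covered.

Proof. Write $R = \prod_{i=1}^d (1 + \varepsilon_i \varphi_i)$ for $\{\varphi_i\} \subseteq \mathcal{F}$ and $\{\varepsilon_i\} \subseteq \{-1, 0, 1\}$. For each $i$, let $\gamma_i \in \hat G$ be such that $\varphi_i = \mathrm{Re}\, u_{\gamma_i}$ or $\varphi_i = \mathrm{Im}\, u_{\gamma_i}$. Since we can write $\mathrm{Re}\, u_\gamma = \frac{1}{2}(u_\gamma + u_{-\gamma})$ and $\mathrm{Im}\, u_\gamma = \frac{1}{2i}(u_\gamma - u_{-\gamma})$, upon expanding the product defining $R$, we see that every $\gamma \in \hat G$ with $\hat R(\gamma) \ne 0$ is of the form $\sum_{i=1}^d \varepsilon'_i \gamma_i$ with $\varepsilon'_i \in \{-1, 0, 1\}$, i.e., a sum of at most $d$ elements from the multiset $\Gamma_0 := \{\gamma_1, \ldots, \gamma_d, -\gamma_1, \ldots, -\gamma_d\}$. (We are using the convention here that the empty sum is equal to the identity of $\hat G$ in order to handle $\hat R(0) \ne 0$.) But we can replace $\Gamma_0$ by an actual set $\Gamma \subseteq \hat G$ as follows: For each $i = 1, \ldots, d$, if $\gamma_i$ occurs $t$ times among $\gamma_1, \ldots, \gamma_d$, we replace the $t$ occurrences of $\pm\gamma_i$ by the elements $\pm\gamma_i, \pm 2\gamma_i, \ldots, \pm t\gamma_i$. Since covering allows coefficients in $\{-1, 0, 1\}$, it suffices to keep $\gamma_i, 2\gamma_i, \ldots, t\gamma_i$, which gives a covering set of size at most $d$. □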
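Lemma 3.1 can also be checked by brute force on $\mathbb{Z}_N$, again assuming $u_\gamma(x) = e^{2\pi i\gamma x/N}$. The Python sketch below builds a Riesz product from randomly chosen real and imaginary parts of characters, computes its Fourier support, and verifies that every frequency in $\mathrm{Spec}_0(R)$ is a signed sum $\sum_i \varepsilon'_i \gamma_i$ with $\varepsilon'_i \in \{-1, 0, 1\}$, as in the proof. The variable names and the numerical tolerance `1e-9` used to decide whether $\hat R(\gamma) \ne 0$ are our assumptions.

```python
import cmath
import itertools
import math
import random

N, d = 30, 4
rng = random.Random(5)
gammas = [rng.randrange(1, N) for _ in range(d)]
kinds = [rng.choice(["re", "im"]) for _ in range(d)]
eps = [rng.choice([-1, 1]) for _ in range(d)]


def phi(kind, gamma, x):
    """Re u_gamma(x) = cos(2 pi gamma x / N) or Im u_gamma(x) = sin(2 pi gamma x / N)."""
    angle = 2 * math.pi * gamma * x / N
    return math.cos(angle) if kind == "re" else math.sin(angle)


# R(x) = prod_i (1 + eps_i * phi_i(x)), a degree-d Riesz F-product
R = [math.prod(1 + e * phi(k, g, x) for e, k, g in zip(eps, kinds, gammas)) for x in range(N)]

R_hat = [sum(R[x] * cmath.exp(-2j * math.pi * g * x / N) for x in range(N)) / N for g in range(N)]
support = {g for g in range(N) if abs(R_hat[g]) > 1e-9}   # numerical stand-in for Spec_0(R)

# All signed sums sum_i eps'_i gamma_i (mod N) with eps'_i in {-1, 0, 1}
signed_sums = {sum(s * g for s, g in zip(signs, gammas)) % N
               for signs in itertools.product([-1, 0, 1], repeat=d)}
print(sorted(support), support <= signed_sums)
```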
3.1 Bloom's theorem


Recall that $\Delta_G = \{f : G \to [0, \infty) : \mathbb{E}_\mu f = 1\}$.

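Theorem 3.2 (Bloom). For every $f \in \Delta_G$ and $0 < \delta < 1/2$, there exists a subset $S \subseteq \mathrm{Spec}_\delta(f)$ such that $|S| \ge \frac{\delta}{2} |\mathrm{Spec}_\delta(f)|$ and $S$ is $d$-covered for

$$d \le 24\sqrt{2}\, \frac{\mathrm{Ent}_\mu(f)}{\delta} + O\!\left(\frac{\log(1/\delta)}{\log\log(1/\delta)}\right).$$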


Proof. Setting $\eta = \delta/(2\sqrt{2})$ and applying Theorem 2.1, there exists a $g \in \Delta_G$ such that

$$g = \sum_{i=1}^{N} c_i R_i$$

with $N \ge 1$, $c_1, \ldots, c_N > 0$, and where $R_1, \ldots, R_N$ are degree-$d$ Riesz $\mathcal{F}$-products for $d$ as in (2.2), and furthermore $\|f - g\|_{\mathcal{F}} \le \eta$.

Observe that since $g \in \Delta_G$, we have $\sum_{i=1}^N c_i \mathbb{E}_\mu R_i = \mathbb{E}_\mu g = 1$. Thus we can define a random variable $Z \in \{1, 2, \ldots, N\}$ so that

$$\mathbb{P}[Z = i] = c_i \mathbb{E}_\mu R_i.$$

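Since $\|f - g\|_{\mathcal{F}} \le \eta$, we have $|\hat f(\gamma) - \hat g(\gamma)| \le \sqrt{2}\,\eta$ for every $\gamma \in \hat G$, and we deduce that if $\gamma \in \mathrm{Spec}_\delta(f)$, then $\gamma \in \mathrm{Spec}_{\delta/2}(g)$. For such $\gamma$, writing $\langle u_\gamma, h\rangle = \mathbb{E}_\mu[u_\gamma h]$ (so that $|\langle u_\gamma, h\rangle| = |\hat h(\gamma)|$ for real-valued $h$), we have

$$\mathbb{E}_Z \left|\left\langle u_\gamma, \frac{R_Z}{\mathbb{E}_\mu R_Z}\right\rangle\right| = \sum_{i=1}^N c_i (\mathbb{E}_\mu R_i) \left|\left\langle u_\gamma, \frac{R_i}{\mathbb{E}_\mu R_i}\right\rangle\right| \ge |\langle u_\gamma, g\rangle| > \delta - \sqrt{2}\,\eta = \frac{\delta}{2}.$$

Because $\left|\left\langle u_\gamma, \frac{R_i}{\mathbb{E}_\mu R_i}\right\rangle\right| \le 1$, we conclude that

$$\mathbb{P}\bigl(\hat R_Z(\gamma) \ne 0\bigr) = \mathbb{P}\bigl(|\langle u_\gamma, R_Z\rangle| > 0\bigr) \ge \frac{\delta}{2}.$$

By linearity of expectation, $\mathbb{E}_Z |\mathrm{Spec}_0(R_Z) \cap \mathrm{Spec}_\delta(f)| \ge \frac{\delta}{2} |\mathrm{Spec}_\delta(f)|$. Moreover, by Lemma 3.1, every set $\mathrm{Spec}_0(R_i)$ is $d$-covered, and hence so is each of its subsets. Thus there exists at least one $i$ for which $S = \mathrm{Spec}_0(R_i) \cap \mathrm{Spec}_\delta(f)$ completes the proof of the theorem. □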


3.2 Chang's theorem 

Theorem 3.3 (Chang). For every $f \in \Delta_G$ and $\delta > 0$, the set $\mathrm{Spec}_\delta(f)$ is $d$-covered for

$$d \le 4\, \frac{\mathrm{Ent}_\mu(f)}{\delta^2}.$$

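Note that Theorem 2.1 implies there is a density $g \in \Delta_G$ such that $\mathrm{Spec}_\delta(f) \subseteq \mathrm{Spec}_0(g)$ and, from (2.1), one can write $g(x) = \psi(u_{\gamma_1}(x), \ldots, u_{\gamma_k}(x))$ for some function $\psi$ and $\gamma_1, \ldots, \gamma_k \in \hat G$ with $k \le O(\mathrm{Ent}_\mu(f)/\delta^2)$. In the special case $G = \mathbb{F}_2^n$, this implies that

$$\mathrm{Spec}_\delta(f) \subseteq \mathrm{Spec}_0(g) \subseteq \mathrm{span}_{\mathbb{F}_2}(\gamma_1, \ldots, \gamma_k) = \Bigl\{\sum_{i=1}^k \varepsilon_i \gamma_i : \varepsilon_i \in \{-1, 0, 1\}\Bigr\},$$

yielding Theorem 3.3 for $G = \mathbb{F}_2^n$ (with a larger constant in place of $4$). For general finite abelian $G$, this no longer holds, and one obtains instead the following statement.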
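Lemma 3.4. For every $f \in \Delta_G$ and $0 < \delta < 1/2$, there is a set $A \subseteq \hat G$ with

$$|A| \le 18\, \frac{\mathrm{Ent}_\mu(f)}{\delta^2}$$

and such that every element $\gamma \in \mathrm{Spec}_\delta(f)$ can be written

$$\gamma = \sum_{i=1}^d \varepsilon_i \gamma_i, \qquad (\gamma_1, \ldots, \gamma_d) \in A^d,\ \ \varepsilon_1, \ldots, \varepsilon_d \in \{-1, 0, 1\},$$

with

$$d \le 12\sqrt{2}\, \frac{\mathrm{Ent}_\mu(f)}{\delta} + O\!\left(\frac{\log(1/\delta)}{\log\log(1/\delta)}\right).$$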


This should be compared to [Shk06, Thm. 4], which achieves a worse bound on $|A|$ but the significantly better bound $d \le O(\mathrm{Ent}_\mu(f))$.


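Proof. As in the proof of Theorem 3.2, set $\eta = \delta/\sqrt{2}$ and apply Theorem 2.1 to obtain a density $g = \sum_{i=1}^N c_i R_i$ where each $R_i$ is a degree-$d$ Riesz $\mathcal{F}'$-product with $d$ as in (2.2). Let $A \subseteq \hat G$ contain, for each $\varphi \in \mathcal{F}'$, one $\gamma$ with $\varphi \in \{\mathrm{Re}\, u_\gamma, \mathrm{Im}\, u_\gamma\}$, so that $|A| \le |\mathcal{F}'| \le 9\,\mathrm{Ent}_\mu(f)/\eta^2 = 18\,\mathrm{Ent}_\mu(f)/\delta^2$. Now $\gamma \in \mathrm{Spec}_\delta(f)$ implies $\gamma \in \mathrm{Spec}_0(g)$ (since $|\hat f(\gamma) - \hat g(\gamma)| \le \sqrt{2}\,\eta = \delta$), which means that $\gamma \in \mathrm{Spec}_0(R_i)$ for some $i = 1, \ldots, N$.

To conclude, observe that every element of $\mathrm{Spec}_0(R_i)$ can be written as $\sum_{j=1}^d \varepsilon_j \gamma_j$ for some tuple $(\gamma_1, \ldots, \gamma_d) \in A^d$ (recall the proof of Lemma 3.1). □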


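In order to prove Theorem 3.3 for general $G$, we recall the following definition. Say that a subset $A \subseteq \hat G$ is disassociated if

$$\sum_{\gamma \in A} \varepsilon_\gamma \gamma = 0 \ \text{ and } \ \{\varepsilon_\gamma\} \subseteq \{-1, 0, 1\} \implies \varepsilon_\gamma = 0 \quad \forall \gamma \in A.$$

If $A \subseteq \mathrm{Spec}_\delta(f)$ is a maximal disassociated subset, then $\mathrm{Spec}_\delta(f)$ is covered by $A$: for any $\gamma \in \mathrm{Spec}_\delta(f) \setminus A$, the set $A \cup \{\gamma\}$ admits a non-trivial vanishing signed sum, and the coefficient of $\gamma$ in it must be non-zero. Thus the following lemma finishes the proof of Theorem 3.3. The argument is based on a proof of Rudin's inequality credited to I. Z. Ruzsa in [Gre04].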
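Disassociativity and the covering property of a maximal disassociated subset can be checked by brute force in $\mathbb{Z}_N$ for small sets. The Python sketch below (helper names hypothetical; exponential in $|A|$, so only suitable for toy sizes) builds a maximal disassociated subset greedily and verifies that it covers the whole set. A single greedy pass does produce a maximal subset: once $A \cup \{s\}$ fails to be disassociated, so does every set containing it.

```python
import itertools


def signed_sums(A, N):
    """All sums sum_{a in A} eps_a * a (mod N) with eps_a in {-1, 0, 1}."""
    return {sum(e * a for e, a in zip(signs, A)) % N
            for signs in itertools.product([-1, 0, 1], repeat=len(A))}


def is_disassociated(A, N):
    """True if the only vanishing signed sum over A (mod N) is the trivial one."""
    A = list(A)
    for signs in itertools.product([-1, 0, 1], repeat=len(A)):
        if any(signs) and sum(e * a for e, a in zip(signs, A)) % N == 0:
            return False
    return True


def maximal_disassociated(S, N):
    """Greedily grow a disassociated subset of S; the result is maximal."""
    A = []
    for s in S:
        if is_disassociated(A + [s], N):
            A.append(s)
    return A


N = 64
S = [1, 2, 3, 4, 7, 8, 12, 16, 33]
A = maximal_disassociated(S, N)
print(A, set(S) <= signed_sums(A, N))
```

Here the greedy pass keeps the powers of two $\{1, 2, 4, 8, 16\}$ and rejects, for instance, $33 \equiv -(1 + 2 + 4 + 8 + 16) \pmod{64}$.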


Lemma 3.5. If $A \subseteq \mathrm{Spec}_\delta(f)$ is disassociated, then

$$|A| \le 4\, \frac{\mathrm{Ent}_\mu(f)}{\delta^2}.$$


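Proof. Let $\mathcal{F}_1 = \{\mathrm{Re}\, u_\gamma : \gamma \in A\}$ and $\mathcal{F}_2 = \{\mathrm{Im}\, u_\gamma : \gamma \in A\}$.

Claim 3.6. The families $\mathcal{F}_1$ and $\mathcal{F}_2$ are Laplace pseudorandom.

Given Claim 3.6, we have

$$|A|\,\delta^2 \le \sum_{\varphi \in \mathcal{F}_1 \cup \mathcal{F}_2} \langle f, \varphi\rangle^2 \le 4\, \mathrm{Ent}_\mu(f),$$

where the first inequality follows from $A \subseteq \mathrm{Spec}_\delta(f)$ (since $|\hat f(\gamma)|^2 = \langle f, \mathrm{Re}\, u_\gamma\rangle^2 + \langle f, \mathrm{Im}\, u_\gamma\rangle^2$) and the second is Lemma 2.2 applied to $\mathcal{F}_1$ and $\mathcal{F}_2$ separately.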

So let us turn to the proof of Claim 3.6. We prove it for $\mathcal{F}_1$, as the proof for $\mathcal{F}_2$ is essentially identical. We require the following two basic facts: For any $t \in \mathbb{R}$ and $x \in [-1, 1]$,

$$e^{tx} \le \frac{e^t + e^{-t}}{2} + x\, \frac{e^t - e^{-t}}{2} = \cosh(t) + x \sinh(t), \tag{3.1}$$

$$\cosh(t) \le e^{t^2/2}. \tag{3.2}$$

The first uses the fact that $x \mapsto e^{tx}$ is convex.
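For completeness, one standard way to verify (3.2) is to compare Taylor series term by term, using $(2k)! \ge 2^k\, k!$ (the even factors $2, 4, \ldots, 2k$ of $(2k)!$ multiply to exactly $2^k k!$, and the odd factors are at least $1$):

$$\cosh(t) = \sum_{k \ge 0} \frac{t^{2k}}{(2k)!} \le \sum_{k \ge 0} \frac{t^{2k}}{2^k\, k!} = \sum_{k \ge 0} \frac{(t^2/2)^k}{k!} = e^{t^2/2}.$$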
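Now write

$$\mathbb{E}_\mu \exp\Bigl(\sum_{\varphi \in \mathcal{F}_1} \lambda_\varphi \varphi\Bigr) = \mathbb{E}_\mu \prod_{\varphi \in \mathcal{F}_1} e^{\lambda_\varphi \varphi} \le \mathbb{E}_\mu \prod_{\varphi \in \mathcal{F}_1} \bigl(\cosh(\lambda_\varphi) + \varphi \sinh(\lambda_\varphi)\bigr). \tag{3.3}$$

Recalling that every $\varphi \in \mathcal{F}_1$ is of the form $\varphi = \mathrm{Re}\, u_\gamma = \frac{1}{2}(u_\gamma + u_{-\gamma})$ for some $\gamma \in A$, we see that the right-hand side of (3.3) breaks into a linear combination of characters $u_a$ with

$$a = \sum_{\gamma \in A} \varepsilon_\gamma \gamma, \qquad \varepsilon_\gamma \in \{-1, 0, 1\}.$$

But $\mathbb{E}_\mu[u_a] = 0$ unless $a = 0$. By the disassociativity of $A$, this can only happen if $\varepsilon_\gamma = 0$ for all $\gamma \in A$, and the corresponding term is $\prod_{\varphi \in \mathcal{F}_1} \cosh(\lambda_\varphi)$. In particular, we conclude that

$$\mathbb{E}_\mu \exp\Bigl(\sum_{\varphi \in \mathcal{F}_1} \lambda_\varphi \varphi\Bigr) \le \prod_{\varphi \in \mathcal{F}_1} \cosh(\lambda_\varphi) \overset{(3.2)}{\le} \exp\Bigl(\frac{1}{2} \sum_{\varphi \in \mathcal{F}_1} \lambda_\varphi^2\Bigr),$$

implying that $\mathcal{F}_1$ is Laplace pseudorandom and completing the argument.

□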
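A numerical check of Claim 3.6 in a small cyclic group may help build intuition. The Python sketch below assumes $G = \mathbb{Z}_{64}$ with $u_\gamma(x) = e^{2\pi i \gamma x/64}$ and the disassociated set $A = \{1, 2, 4, 8, 16\}$, draws random coefficients $\lambda_\varphi$, and compares both sides of (2.3) for $\mathcal{F}_1 = \{\mathrm{Re}\, u_\gamma : \gamma \in A\}$. It also checks the intermediate identity $\mathbb{E}_\mu \prod_\varphi (\cosh \lambda_\varphi + \varphi \sinh \lambda_\varphi) = \prod_\varphi \cosh \lambda_\varphi$ used in the proof. The set, the sample size, and the names are illustrative.

```python
import math
import random

N = 64
A = [1, 2, 4, 8, 16]                      # disassociated in Z_64 (all signed sums have |.| <= 31)
rng = random.Random(6)


def re_char(gamma, x):
    return math.cos(2 * math.pi * gamma * x / N)   # Re u_gamma(x)


gaps = []
for _ in range(100):
    lam = [rng.uniform(-3, 3) for _ in A]
    # log E_mu exp(sum_gamma lambda_gamma Re u_gamma) under the uniform measure on Z_N
    lhs = math.log(sum(math.exp(sum(l * re_char(g, x) for l, g in zip(lam, A)))
                       for x in range(N)) / N)
    rhs = 0.5 * sum(l * l for l in lam)
    # E_mu prod (cosh + phi sinh) should equal prod cosh, by disassociativity
    prod_mean = sum(math.prod(math.cosh(l) + re_char(g, x) * math.sinh(l)
                              for l, g in zip(lam, A)) for x in range(N)) / N
    prod_cosh = math.prod(math.cosh(l) for l in lam)
    gaps.append((rhs - lhs, abs(prod_mean - prod_cosh) / prod_cosh))

print(min(gap for gap, _ in gaps), max(err for _, err in gaps))
```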